The article presents a mathematical proof that transformer language models are injective, and therefore invertible, countering the common belief that non-linear activations and normalization discard information about the input. It introduces SipIt, an algorithm that efficiently reconstructs the exact input text from hidden activations, and discusses the implications for model transparency and safe deployment.
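A minimal sketch of what such token-by-token inversion could look like is shown below, assuming a hypothetical `model(ids)` that returns per-position hidden states and a `target_hidden` tensor holding the activations of the unknown prompt. This is not the SipIt procedure from the article, only an illustration of the core idea: because the map from prompts to activations is injective, at each position exactly one candidate token can reproduce the observed hidden state.

```python
import torch

def reconstruct_prompt(model, target_hidden, vocab_size, atol=1e-6):
    """Hypothetical sketch of exact prompt recovery from hidden activations.

    Greedily recovers the token at each position by testing every candidate
    token and keeping the one whose hidden activation matches the observed
    target; injectivity guarantees at most one match per position.

    Assumptions (not from the article): `model(ids)` returns hidden states
    of shape (batch, seq_len, d_model); `target_hidden` has shape
    (seq_len, d_model) and holds the activations of the unknown prompt.
    """
    seq_len = target_hidden.shape[0]
    recovered = []
    for pos in range(seq_len):
        for tok in range(vocab_size):
            candidate = torch.tensor(recovered + [tok]).unsqueeze(0)
            with torch.no_grad():
                hidden = model(candidate)[0]  # (pos + 1, d_model)
            # An exact match (up to numerical tolerance) identifies the token.
            if torch.allclose(hidden[pos], target_hidden[pos], atol=atol):
                recovered.append(tok)
                break
        else:
            raise RuntimeError(f"No matching token found at position {pos}")
    return recovered
```

A brute-force scan over the vocabulary like this is only meant to convey why injectivity makes exact reconstruction possible; the article's algorithm is presented as an efficient method rather than an exhaustive search.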